Resource Constrained Exploration in Reinforcement Learning
Authors
Abstract
This paper examines temporal difference reinforcement learning (RL) with adaptive and directed exploration for resource-limited missions. The scenario considered is that of an energy-limited agent that must explore an unknown region to find new energy sources. The presented algorithm uses a Gaussian process (GP) regression model to estimate the value function in an RL framework. To avoid myopic exploration, we developed a resource-weighted objective function that combines the estimated value function with an estimate of the future information gain, obtained from an action rollout, to generate directed exploratory action sequences. The results show that under this objective function, the learning agent continues exploring for better state-action trajectories when platform energy is high and follows conservative energy-gaining trajectories when platform energy is low.
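The abstract does not give the form of the resource-weighted objective, so the following is only a minimal sketch of the general idea under stated assumptions, not the paper's algorithm. It assumes a squared-exponential GP fitted to fixed value targets (no temporal difference updates), approximates a rollout's information gain by summing `0.5 * log(1 + variance / noise)` over its states, and blends the two terms linearly with a weight equal to the stored-energy fraction. All function names, the kernel, the weighting rule, and the toy data are hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between the rows of a and the rows of b."""
    sq_dist = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq_dist / length_scale ** 2)

def gp_posterior(x_train, y_train, x_query, noise=1e-2):
    """Zero-mean GP regression: posterior mean and variance at x_query."""
    k_tt = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_qt = rbf_kernel(x_query, x_train)
    mean = k_qt @ np.linalg.solve(k_tt, y_train)
    var = 1.0 - np.sum(k_qt * np.linalg.solve(k_tt, k_qt.T).T, axis=1)
    return mean, np.maximum(var, 0.0)

def rollout_information_gain(var, noise=1e-2):
    """Assumed proxy: treat the rollout states as independent observations."""
    return 0.5 * np.sum(np.log1p(var / noise))

def resource_weighted_score(value, info_gain, energy, energy_max):
    """Assumed blend: the weight on information gain grows with stored energy."""
    w = np.clip(energy / energy_max, 0.0, 1.0)
    return (1.0 - w) * value + w * info_gain

def select_rollout(rollouts, x_train, y_train, energy, energy_max, noise=1e-2):
    """Score each candidate rollout and return the best name with all scores."""
    scores = {}
    for name, states in rollouts.items():
        mean, var = gp_posterior(x_train, y_train, states, noise)
        info = rollout_information_gain(var, noise)
        scores[name] = resource_weighted_score(mean.mean(), info, energy, energy_max)
    return max(scores, key=scores.get), scores

# Toy 1-D example: value estimates are known only near x in [0, 1].
X_TRAIN = np.array([[0.0], [0.5], [1.0]])
Y_TRAIN = np.array([1.0, 1.2, 0.9])
ROLLOUTS = {
    "conservative": np.array([[0.25], [0.75]]),  # stays in the visited region
    "explore": np.array([[3.0], [4.0]]),         # heads into unvisited states
}

if __name__ == "__main__":
    for energy in (0.9, 0.1):
        choice, _ = select_rollout(ROLLOUTS, X_TRAIN, Y_TRAIN, energy, 1.0)
        print(f"energy={energy}: choose {choice}")
```

In this toy setup the exploratory rollout scores higher when stored energy is high, and the conservative rollout scores higher when it is low, which mirrors the qualitative behaviour the abstract reports. A linear blend is simply the easiest weighting to inspect; other monotone weightings of the energy level would serve the same illustrative purpose.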
Similar resources
Cost-Sensitive Exploration in Bayesian Reinforcement Learning
In this paper, we consider Bayesian reinforcement learning (BRL) where actions incur costs in addition to rewards, and thus exploration has to be constrained in terms of the expected total cost while learning to maximize the expected long-term total reward. In order to formalize cost-sensitive exploration, we use the constrained Markov decision process (CMDP) as the model of the environment, in ...
Learning to soar: Resource-constrained exploration in reinforcement learning
This paper examines temporal difference reinforcement learning with adaptive and directed exploration for resource-limited missions. The scenario considered is that of an unpowered aerial glider learning to perform energy-gaining flight trajectories in a thermal updraft. The presented algorithm, eGP-SARSA(λ), uses a Gaussian process regression model to estimate the value function in a reinforcem...
Constrained Bayesian Reinforcement Learning via Approximate Linear Programming
In this paper, we consider the safe learning scenario where we need to restrict the exploratory behavior of a reinforcement learning agent. Specifically, we treat the problem as a form of Bayesian reinforcement learning in an environment that is modeled as a constrained MDP (CMDP) where the cost function penalizes undesirable situations. We propose a model-based Bayesian reinforcement learning ...
RTP-Q: A Reinforcement Learning System with Time Constraints Exploration Planning for Accelerating the Learning Rate
Reinforcement learning is an efficient method for solving Markov decision processes, in which an agent improves its performance by using scalar reward values and gains a higher capability for reactive and adaptive behaviors. Q-learning is a representative reinforcement learning method which is guaranteed to obtain an optimal policy but needs numerous trials to achieve it. k-Certainty Exploration Learning Sy...
Adaptable bandwidth planning using reinforcement learning
In order to improve bandwidth allocation by taking into account feedback from the operational environment, adaptable bandwidth planning based on reinforcement learning is proposed. The approach is based on new constrained scheduling algorithms controlled by reinforcement learning techniques. Different constrained scheduling algorithms, such as “conflict free scheduling with minimum duration”, “partial disp...